141 research outputs found

    Subset Quantile Normalization using Negative Control Features


    Prey capture and meat-eating by the wild colobus monkey _Rhinopithecus bieti_ in Yunnan, China

    If extant primates evolved from an insectivorous ancestor, then primate entomophagy would be a primitive trait. Many taxa, however, have undergone a dietary shift from entomophagy to phytophagy, evolving a specialised gut and dentition and becoming exclusive herbivores. The exclusively herbivorous taxa are the Malagasy families Indriidae and Lepilemuridae and the Old World monkey subfamily Colobinae. Among these, meat-eating has been observed only anomalously; the exceptions are the Hanuman langur (_Semnopithecus entellus_), which feeds on insects seasonally, and a single observation of a nestling bird preyed upon by wild Sichuan snub-nosed monkeys (_Rhinopithecus roxellana_). Here, we describe the regular capture of warm-blooded animals and the eating of meat by a colobine, the critically endangered Yunnan snub-nosed monkey (_Rhinopithecus bieti_). This monkey engages in scavenge hunting as a male-biased activity that may, in fact, be related to group structure and spatial spread. In this context, meat-eating can be regarded as an energy/nutrient-maximization feeding strategy rather than as a consequence of any special characteristic of meat itself. The finding of meat-eating in forest-dwelling primates might provide new insights into the evolution of dietary habits in early humans.

    A Statistical Framework for the Analysis of Microarray Probe-Level Data

    Microarrays are an example of the powerful high-throughput genomics tools that are revolutionizing the measurement of biological systems. In this and other technologies, a number of critical steps are required to convert the raw measures into the data relied upon by biologists and clinicians. These data manipulations, referred to as preprocessing, have enormous influence on the quality of the ultimate measurements and the studies that rely upon them. Many researchers have previously demonstrated that the use of modern statistical methodology can substantially improve the accuracy and precision of gene expression measurements, relative to ad-hoc procedures introduced by designers and manufacturers of the technology. However, further substantial improvements are possible. Microarrays are now being used to measure diverse genomic endpoints, including yeast mutant representations, the presence of SNPs, the presence of deletions/insertions, and protein binding sites by chromatin immunoprecipitation (known as ChIP-chip). In each case, the genomic units of measurement are relatively short DNA molecules referred to as probes. Without an appropriate understanding of the bias and variance of these measurements, biological inferences based upon probe analysis will be compromised. Standard operating procedure for microarray researchers is to use preprocessed data as the starting point for the statistical analyses that produce reported results. This has prevented many researchers from carefully considering their choice of preprocessing methodology. Furthermore, the fact that the preprocessing step greatly affects the stochastic properties of the final statistical summaries is ignored. In this paper we propose a statistical framework that permits the integration of preprocessing into the standard statistical analysis flow of microarray data. We demonstrate its usefulness by applying the idea in three different applications of the technology.

    A statistical framework for the analysis of microarray probe-level data

    In microarray technology, a number of critical steps are required to convert the raw measurements into the data relied upon by biologists and clinicians. These data manipulations, referred to as preprocessing, influence the quality of the ultimate measurements and the studies that rely upon them. Standard operating procedure for microarray researchers is to use preprocessed data as the starting point for the statistical analyses that produce reported results. This has prevented many researchers from carefully considering their choice of preprocessing methodology. Furthermore, the fact that the preprocessing step affects the stochastic properties of the final statistical summaries is often ignored. In this paper we propose a statistical framework that permits the integration of preprocessing into the standard statistical analysis flow of microarray data. This general framework is relevant in many microarray platforms and motivates targeted analysis methods for specific applications. We demonstrate its usefulness by applying the idea in three different applications of the technology. Comment: Published at http://dx.doi.org/10.1214/07-AOAS116 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).

    Stochastic Models Based on Molecular Hybridization Theory for Short Oligonucleotide Microarrays

    High-density oligonucleotide expression arrays are a widely used tool for the measurement of gene expression on a large scale. Affymetrix GeneChip arrays appear to dominate this market. These arrays use short oligonucleotides to probe for genes in an RNA sample. Due to optical noise, non-specific hybridization, probe-specific effects, and measurement error, ad-hoc measures of expression that summarize probe intensities can lead to imprecise and inaccurate results. Various researchers have demonstrated that expression measures based on simple statistical models can provide great improvements over the ad-hoc procedure offered by Affymetrix. Recently, physical models based on molecular hybridization theory have been proposed as useful tools for the prediction of, for example, non-specific hybridization. These physical models show great potential in terms of improving existing expression measures. In this paper we demonstrate that the system producing the measured intensities is too complex to be fully described with these relatively simple physical models, and we propose empirically motivated stochastic models that complement the above-mentioned molecular hybridization theory to provide a comprehensive description of the data. We discuss how the proposed model can be used to obtain improved measures of expression useful for data analysts.
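    The noise sources listed in this abstract are often summarized with a signal-plus-background decomposition. As a hedged sketch, and not the paper's own notation, the observed intensity for probe j of gene i might be written as:

    ```latex
    % Illustrative decomposition of an observed probe intensity
    % (symbols are ours): optical noise + non-specific hybridization
    % + specific signal.
    Y_{ij} = O_{ij} + N_{ij} + S_{ij}, \qquad
    S_{ij} = \phi_{j}\,\theta_{i}\,\varepsilon_{ij}
    ```

    Here O denotes optical noise, N non-specific hybridization, and the specific signal S scales the expression level theta_i by a probe-specific affinity phi_j with multiplicative measurement error epsilon. Physical hybridization models aim to predict terms such as N from probe sequence; the stochastic models proposed here would describe the remaining empirical variation.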

    Feature-Level Exploration of the Choe et al. Affymetrix GeneChip Control Dataset

    We describe why the Choe et al. control dataset should not be used to assess GeneChip expression measures.

    Removing technical variability in RNA-seq data using conditional quantile normalization

    The ability to measure gene expression on a genome-wide scale is one of the most promising accomplishments in molecular biology. Microarrays, the technology that first permitted this, were riddled with problems due to unwanted sources of variability. Many of these problems are now mitigated, after a decade's worth of statistical methodology development. The recently developed RNA sequencing (RNA-seq) technology has generated much excitement, in part due to claims of reduced variability in comparison to microarrays. However, we show that RNA-seq data demonstrate unwanted and obscuring variability similar to what was first observed in microarrays. In particular, we find that guanine-cytosine content (GC-content) has a strong sample-specific effect on gene expression measurements that, if left uncorrected, leads to false positives in downstream results. We also report on commonly observed data distortions that demonstrate the need for data normalization. Here, we describe a statistical methodology that improves precision by 42% without loss of accuracy. Our resulting conditional quantile normalization algorithm combines robust generalized regression, to remove systematic bias introduced by deterministic features such as GC-content, with quantile normalization, to correct for global distortions.
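    The quantile-normalization building block named in this abstract (shown here without the GC-content regression step that conditional quantile normalization adds) can be sketched in a few lines. This is a hedged illustration, not the authors' implementation; the function name and toy data are ours.

    ```python
    def quantile_normalize(columns):
        """Force every sample to share the same empirical distribution.

        columns: list of equal-length lists, one list of values per sample.
        The value at rank r in every sample is replaced by the mean of the
        rank-r values across all samples.
        """
        n = len(columns[0])
        # Reference distribution: mean of the sorted values across samples.
        sorted_cols = [sorted(col) for col in columns]
        reference = [sum(vals) / len(columns) for vals in zip(*sorted_cols)]
        out = []
        for col in columns:
            order = sorted(range(n), key=lambda i: col[i])  # indices by rank
            normed = [0.0] * n
            for rank, i in enumerate(order):
                normed[i] = reference[rank]
            out.append(normed)
        return out

    # Toy data: two samples measuring the same three genes on different scales.
    samples = [[2.0, 5.0, 3.0], [4.0, 8.0, 6.0]]
    normalized = quantile_normalize(samples)
    # Both samples now share the same sorted values: [3.0, 4.5, 6.5].
    ```

    Tie handling and the per-sample GC-content regression that conditional quantile normalization layers on top are omitted for brevity.
    
    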

    Comparison of Affymetrix GeneChip Expression Measures

    Affymetrix GeneChip expression array technology has become a standard tool in medical science and basic biology research. In this system, preprocessing occurs before one obtains expression level measurements. Because the number of competing preprocessing methods was large and growing, in the summer of 2003 we developed a benchmark to help users of the technology identify the best method for their application. In conjunction with the release of a Bioconductor R package (affycomp), a webtool was made available for developers of preprocessing methods to submit them to a benchmark for comparison. There have now been over 30 methods compared via the webtool. Results: Background correction, one of the main steps in preprocessing, has the largest effect on performance. In particular, background correction appears to improve accuracy but, in general, worsen precision. The benchmark results put this balance in perspective. Furthermore, we have improved some of the original benchmark metrics to provide more detailed information regarding accuracy and precision. A handful of methods stand out as maintaining a useful balance. The affycomp package, now version 1.5.2, continues to be available as part of the Bioconductor project (http://www.bioconductor.org). The webtool continues to be available at http://affycomp.biostat.jhsph.edu.